European Bioconductor Meeting 2017, Cambridge, December 6, 2017

Goal

  • Package for visualization of high-dimensional datasets via dimension reduction methods → for exploratory analysis of genomic data in R: esetVis
  • input: Bioconductor ExpressionSet object
  • 3 multivariate projection methods:
    • spectral map (Lewi, P. J. 1976): mpm package → esetSpectralMap
    • t-distributed stochastic neighbor embedding (t-SNE) (Van der Maaten and Hinton 2008): Rtsne package → esetTsne
    • linear discriminant analysis (Fisher, R. A. 1936): MASS package → esetLda
  • graphics libraries:
    • static with ggplot2 package
    • interactive: ggvis, rbokeh packages

Combine ExpressionSet + Grammar of Graphics

  • ExpressionSet
    • combines 3 (or more) layers of a genomic experiment in one R object:
      • data, e.g. change of gene expression upon a certain condition
        → slot: assayData
      • feature annotation, e.g. symbol, probe set ID, family
        → slot: featureData
      • sample annotation, e.g. treatment, time point, concentration
        → slot: phenoData
    • efficient storage and easy accessibility
  • Easy aesthetic mapping of annotation variables via Grammar of Graphics (ggplot2/ggvis/rbokeh)
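
The three layers listed above can be illustrated with a small toy object. This is a minimal sketch using the Biobase constructor and accessors; the matrix values, sample names, and annotation columns (treatment, timePoint, SYMBOL) are made up for illustration and are not taken from the presentation.

```r
library(Biobase)

# Toy expression matrix: 4 features (rows) x 3 samples (columns); values are invented.
exprsMat <- matrix(
    c(1.2, 0.8, 2.5,
      3.1, 2.9, 0.4,
      0.5, 0.7, 0.6,
      2.2, 1.9, 2.8),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("probe", 1:4), paste0("sample", 1:3))
)

# Sample annotation: one row per sample, row names match the matrix column names.
sampleAnnot <- data.frame(
    treatment = c("control", "treated", "treated"),
    timePoint = c(0, 6, 24),
    row.names = colnames(exprsMat)
)

# Feature annotation: one row per feature, row names match the matrix row names.
featureAnnot <- data.frame(
    SYMBOL = c("GENE1", "GENE2", "GENE3", "GENE4"),
    row.names = rownames(exprsMat)
)

# Combine the three layers in one object.
eset <- ExpressionSet(
    assayData = exprsMat,
    phenoData = AnnotatedDataFrame(sampleAnnot),
    featureData = AnnotatedDataFrame(featureAnnot)
)

exprs(eset)  # data (slot assayData)
pData(eset)  # sample annotation (slot phenoData)
fData(eset)  # feature annotation (slot featureData)
```

Any column of pData(eset) or fData(eset) can then be referred to by name in a plotting call, which is what makes the aesthetic mapping convenient.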

Static spectral map

esetSpectralMap(eset = ALL, 
    title = "Spectral map of acute lymphoblastic leukemia dataset",
    colorVar = "BT", color = colorPalette, shapeVar = "sex", shape = c(15, 19), 
    sizeVar = "age", sizeRange = c(1,6), alphaVar = "remissionType",
    topGenes = 10, topGenesVar = "SYMBOL", topSamples = 15, topSamplesVar = "cod")
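
The call above relies on objects prepared beforehand. The following is only a sketch of one possible setup, assuming the ALL dataset comes from the Bioconductor ALL data package; the palette shown here is a hypothetical stand-in and not the one used in the presentation. Other variables used in the call (for example remissionType and SYMBOL) need their own preparation, which is not covered here.

```r
library(Biobase)
library(ALL)

# Load the acute lymphoblastic leukemia ExpressionSet shipped with the ALL package.
data(ALL)

# Hypothetical palette: one color per level of the sample annotation variable BT,
# named by level so that each level maps to a fixed color.
btLevels <- levels(ALL$BT)
colorPalette <- setNames(rainbow(length(btLevels)), btLevels)
```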

Pathway annotation

esetSpectralMap(eset = ALL, 
    title = "Spectral map of acute lymphoblastic leukemia dataset with pathway annotation",
    geneSets = geneSets, geneSetsVar = "ENTREZID", topGeneSets = 5, topGenes = 0,
    colorVar = "group", color = c('B' = "dodgerblue", 'T' = "red3"), 
    alphaVar = "remissionType", shapeVar = "remissionType", topSamples = 0)

Interactive visualization

esetLda(eset = ALL, ldaVar = "BT", colorVar = "group", shapeVar = "sex",
    title = "Linear discriminant analysis", alphaVar = "stage",
    type = "interactive", figInteractiveSize = c(400, 400),
    interactiveTooltipExtraVars = varLabels(ALL), packageInteractivity = "rbokeh")

Implementation details
(Bioconductor development version)

  • new S4 class: esetPlot
    • contains:
      • feature and sample annotation: eSet
      • output from dimension reduction methods: dataPlotSamples, dataPlotGenes (optional)
      • all plot specification details
    • child class for each plot type:
      ggplotEsetPlot, rbokehEsetPlot, ggvisEsetPlot (easily extendable)
    • dedicated plotEset function for visualization
  • advantages:
    • separates computation (the time-consuming step) from visualization
    • easy transfer to other graphic libraries/dimension reduction methods
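
The design outlined above (a parent class holding the precomputed coordinates and plot specifications, one child class per graphics backend, and a dedicated plotting function) can be sketched in plain S4. This is an illustrative sketch only and not the esetVis implementation: all class and function names (sketchPlot, sketchBasePlot, sketchTextPlot, plotSketch) are hypothetical, PCA via prcomp() stands in for the package's dimension reduction methods, and the data are simulated.

```r
library(methods)

# Step 1, computation (done once): PCA on simulated data as a stand-in
# for a dimension reduction method.
set.seed(1)
exprsMat <- matrix(rnorm(60), nrow = 10,
    dimnames = list(paste0("gene", 1:10), paste0("sample", 1:6)))
pca <- prcomp(t(exprsMat))
coords <- data.frame(X = pca$x[, 1], Y = pca$x[, 2],
    group = rep(c("B", "T"), each = 3))

# Step 2, storage: hypothetical parent class with coordinates and plot specifications.
setClass("sketchPlot", slots = c(
    dataPlotSamples = "data.frame",  # sample coordinates from step 1
    colorVar = "character",          # annotation column mapped to color
    title = "character"
))

# One child class per backend; adding a backend means adding a class and a method.
setClass("sketchBasePlot", contains = "sketchPlot")
setClass("sketchTextPlot", contains = "sketchPlot")

# Step 3, visualization: a generic dispatching on the child class.
setGeneric("plotSketch", function(object) standardGeneric("plotSketch"))

setMethod("plotSketch", "sketchBasePlot", function(object) {
    d <- object@dataPlotSamples
    plot(d$X, d$Y, pch = 19, main = object@title, xlab = "X", ylab = "Y",
        col = as.integer(factor(d[[object@colorVar]])))
    invisible(object)
})

setMethod("plotSketch", "sketchTextPlot", function(object) {
    d <- object@dataPlotSamples
    out <- sprintf("%s: (%.2f, %.2f) [%s]",
        rownames(d), d$X, d$Y, d[[object@colorVar]])
    cat(object@title, out, sep = "\n")
    invisible(out)
})

# The same precomputed coordinates are reused by both backends.
basePlot <- new("sketchBasePlot", dataPlotSamples = coords,
    colorVar = "group", title = "PCA sketch")
textPlot <- new("sketchTextPlot", dataPlotSamples = coords,
    colorVar = "group", title = "PCA sketch")
plotSketch(basePlot)
plotSketch(textPlot)
```

Because the coordinates are stored in the object, switching backend does not repeat the dimension reduction.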

Conclusion

References

Fisher, R. A. 1936. “The Use of Multiple Measurements in Taxonomic Problems.” Annals of Eugenics 7: 179–88.

Lewi, P. J. 1976. “Spectral Mapping, a Technique for Classifying Biological Activity Profiles of Chemical Compounds.” Arzneimittel Forschung (Drug Research) 26: 1295–1300.

Van der Maaten and Hinton. 2008. “Visualizing High-Dimensional Data Using t-SNE.” Journal of Machine Learning Research 9: 2579–2605.